Skip to content

refactor(backend): move revocation versionComment from dedicated column to metadata#6198

Draft
theosanderson-agent wants to merge 3 commits intomainfrom
refactor/version-comment-to-metadata
Draft

refactor(backend): move revocation versionComment from dedicated column to metadata#6198
theosanderson-agent wants to merge 3 commits intomainfrom
refactor/version-comment-to-metadata

Conversation

@theosanderson-agent
Copy link
Collaborator

@theosanderson-agent theosanderson-agent commented Mar 26, 2026

Summary

This PR unifies how versionComment is stored for revisions and revocations, addressing #3135.

Problem: Previously, versionComment was stored differently depending on the type of version entry:

  • For revisions, it was stored as a metadata field in originalData, processed through the preprocessing pipeline, and ended up in processedData.metadata
  • For revocations, it was stored in a dedicated version_comment column on the sequence_entries table, then manually merged into released data via special-case code in ReleasedDataModel

This inconsistency made the codebase harder to understand and maintain.

Solution: Following the approach suggested in #3135, this PR:

  1. Stores revocation versionComment in originalData.metadata — When creating a revocation with a version comment, the comment is stored in the originalData JSONB column as a metadata entry, compressed using the standard compression service

  2. Automatically constructs processedData for revocations in the database view — The sequence_entries_view is updated to build joint_metadata for revocations directly from originalData.metadata. This approach is pipeline-version-independent, meaning the versionComment survives processing pipeline version changes without needing to reprocess revocations

  3. Removes the dedicated version_comment column — A migration (V1.26) copies existing version_comment values into originalData.metadata for all affected revocations, then drops the column

  4. Removes special-case code — The versionComment field is removed from RawProcessedData, and the conditional metadata merging in ReleasedDataModel is removed. The versionComment now flows through the same path for both revisions and revocations

Files changed

  • Migration V1.26: Migrates existing data and recreates the view
  • SubmissionDatabaseService.kt: revoke() now stores versionComment in originalData; streamReleasedSubmissions() no longer reads versionCommentColumn; RawProcessedData no longer has versionComment field
  • SequenceEntriesTable.kt / SequenceEntriesView.kt: Remove versionCommentColumn
  • ReleasedDataModel.kt: Remove conditional versionComment addition for revocations
  • EarliestReleaseDateFinderTest.kt: Remove versionComment field from test data

Key design decision

Rather than inserting into sequence_entries_preprocessed_data for revocations (which would be tied to a specific pipeline version and lost when the pipeline version changes), the view constructs joint_metadata directly from originalData for revocations. This means the versionComment is always available regardless of pipeline version changes, without needing to reprocess revocations.

Test plan

  • Verify compilation passes (confirmed locally)
  • Verify lint passes (confirmed locally)
  • Run full backend test suite (requires PostgreSQL infrastructure)
  • Verify revocation with versionComment shows correctly in released data on preview
image
  • Verify revocation without versionComment still works
image

(currently on main we also see None if this is empty - I think this is probably an issue with the website I assume it submits revocations with an empty string and not None, this isnt too important atm)

  • Verify revision versionComment is unaffected
image
  • Verify migration correctly moves existing version_comment values

🤖 Generated with Claude Code

🚀 Preview: https://refactor-version-comment.loculus.org

…mn to metadata

Store revocation versionComment in originalData metadata instead of the
dedicated version_comment column, unifying how versionComment is handled
for both revisions and revocations.

- When creating a revocation, store versionComment in originalData.metadata
  and compress using the compression service
- Update the database view to automatically construct joint_metadata for
  revocations from originalData, making it pipeline-version-independent
- Remove versionCommentColumn from SequenceEntriesTable and SequenceEntriesView
- Remove versionComment field from RawProcessedData (now comes through
  processedData metadata via the view)
- Remove conditional versionComment addition in ReleasedDataModel
- Add migration V1.26 to move existing version_comment values into
  originalData metadata and drop the column

Closes #3135

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude claude bot added the backend related to the loculus backend component label Mar 26, 2026
@anna-parker anna-parker added update_db_schema preview Triggers a deployment to argocd labels Mar 26, 2026
@theosanderson
Copy link
Member

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30a12fcd2a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 168 to 170
conditionalMetadata(
earliestReleaseDate != null,
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep revocation versionComment out of schema filtering

Removing the dedicated revocation versionComment merge here makes the value depend entirely on processedData.metadata, but that path is normalized by ProcessedMetadataPostprocessor.filterOutExtraFieldsAndAddNulls, which drops metadata keys not present in the organism schema. In environments/configs where versionComment is not part of schema metadata (for example the backend test configs), revocation comments will silently disappear from released data after this change. Previously this code guaranteed the field by injecting it after postprocessing.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend related to the loculus backend component preview Triggers a deployment to argocd update_db_schema

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants